AN APPROACH TO THE PROGRAMMING OF BIASED
REGRESSION ALGORITHMS
Dallas, Texas 75275


ABSTRACT 

Due to the near nonexistence of computer algorithms for calculating
estimators and ancillary statistics that are needed for biased
regression methodologies, many users of these methodologies are
forced to write their own programs. Brute-force coding of such
programs can result in a great waste of computer core and computing
time, as well as inefficient and inaccurate computing techniques.
This article proposes some guides to more efficient programming by
taking advantage of mathematical similarities among several of the
more popular biased regression estimators.

1.  INTRODUCTION 

Regression data analysts currently face a serious computing problem
in their efforts to utilize biased regression techniques. On the one
hand, there is a vast amount of evidence in scientific publications
that biased regression procedures are preferable to ordinary least
squares estimation when the predictor variables are multicollinear
(e.g., Dempster, Schatzoff, and Wermuth (1977) and Gunst and Mason
(1977b)). Ridge Regression (Hoerl and Kennard (1970)), Principal
Component Regression (Massy (1965), Marquardt (1970)), Latent Root
Regression (Hawkins (1973), Webster, Gunst, and Mason (1974)), and
Shrunken Estimators (James and Stein (1961), Mayer and Willke (1973))
encompass a wide variety of popular biased regression methodologies
that have been proposed as alternatives to unbiased least squares
estimation.

Countering the avowed need for biased regression techniques, on the
other hand, is a dearth of computer programs in the standard program
libraries (BMDP (Dixon, 1975), SPSS (Nie et al., 1975), etc.) that
the data analyst can access to perform the required calculations.
Many users of biased regression techniques, given the time lag
between the advent of new biased regression procedures and the
introduction of appropriate computer software, are forced to code
their own algorithms. Most of these users are not primarily computer
programming experts but acquire sufficient knowledge of a programming
language such as FORTRAN to be able to write software needed in their
research. It is to these users that this article is addressed.

The general theme of this article is a discussion of similarities
inherent in the biased estimators listed above and in some of the
more useful diagnostic measures as well. Biased regression
methodologies employ estimators which, although appearing quite
different, can be expressed as functions of common variables. Some of
these estimators are so similar when reexpressed in terms of these
common variables that several authors have grouped them into
"families" (e.g., Hocking, Speed, and Lynn (1976), Gunst and Mason
(1977b)). By taking advantage of the mathematical similarities of the
estimators, core storage requirements and computing time can be
lessened.

2.  INPUT  / DIAGNOSTICS 

The basic input to a regression program is an (n x 1) raw response
vector, Y*, and an (n x p) raw data matrix of predictor variables,
$X^* = [X^*_{ij}]$. Large core requirements can result if Y* and X*
are to be stored and retained during all program calculations. For
virtually all the computations except the calculation of residuals,
however, only summary statistics and pairwise correlations of the
(p+1) input variables are needed. Thus only these statistics need be
stored by the program. The elements of Y* and X* can be stored on
peripheral mass storage devices and only called for during initial
calculations and the computation of residuals; when not needed, the
arrays can be returned to the peripheral storage units.

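It is well-documented that for most regression computations some form
of standardization is desirable (e.g., Marquardt and Snee (1975)).
Let Y and X denote the "unit length" standardization of Y* and X*:

\[ Y_i = (Y^*_i - \bar{Y}^*)/d_y , \qquad X_{ij} = (X^*_{ij} - \bar{X}^*_j)/d_j , \]

where

\[ \bar{Y}^* = n^{-1} \sum_{i=1}^{n} Y^*_i , \qquad \bar{X}^*_j = n^{-1} \sum_{i=1}^{n} X^*_{ij} , \]

\[ d_y = \Big\{ \sum_{i=1}^{n} (Y^*_i - \bar{Y}^*)^2 \Big\}^{1/2} , \qquad d_j = \Big\{ \sum_{i=1}^{n} (X^*_{ij} - \bar{X}^*_j)^2 \Big\}^{1/2} . \]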

Arrays containing the means, $\bar{Y}^*$ and $\bar{X}^*_j$, root sums
of squared deviations, $d_y$ and $d_j$, correlations between the
response and predictor variables, elements of X'Y, and correlations
between pairs of predictor variables, elements of X'X, then contain
the information needed for the calculation of biased regression
estimators. These arrays also contain valuable diagnostic information
regarding associations among the predictor variables.
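The summary arrays described above can be sketched in a few lines. The following is only an illustration, written in Python with NumPy rather than FORTRAN; the function name `summary_arrays`, the dictionary keys, and the simulated data are assumptions made for the example and do not come from the article.

```python
import numpy as np

def summary_arrays(x_raw, y_raw):
    """Means, root sums of squared deviations, X'X and X'Y for the
    unit-length standardization of the raw data (hypothetical helper)."""
    x_raw = np.asarray(x_raw, dtype=float)
    y_raw = np.asarray(y_raw, dtype=float)
    x_mean = x_raw.mean(axis=0)            # column means of X*
    y_mean = y_raw.mean()                  # mean of Y*
    xc = x_raw - x_mean                    # centered predictors
    yc = y_raw - y_mean                    # centered response
    d_x = np.sqrt((xc ** 2).sum(axis=0))   # d_j, one per predictor
    d_y = np.sqrt((yc ** 2).sum())         # d_y
    x_std = xc / d_x                       # unit-length columns
    y_std = yc / d_y
    return {
        "x_mean": x_mean, "y_mean": y_mean, "d_x": d_x, "d_y": d_y,
        "XtX": x_std.T @ x_std,            # predictor correlations
        "XtY": x_std.T @ y_std,            # predictor-response correlations
    }

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
x_raw = rng.normal(size=(50, 3))
y_raw = x_raw @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)
s = summary_arrays(x_raw, y_raw)
```

Once `s` has been formed, the raw arrays are needed again only for residuals; everything else in the later sections can be computed from the (p x p) and (p x 1) arrays alone.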

Routinely, the means and standard deviations of the input variables
and the arrays X'Y and X'X should be output for regression data. The
means and standard deviations yield summary information about the
location and dispersion of the input variables which can aid in
assessing whether the data collected are representative of the
process or phenomenon under study. Pairwise correlations indicate the
strength of linear associations between two variables. In particular,
large pairwise correlations among the predictor variables alert the
user to the possibility of strong multicollinearities which might
have an adverse effect on least squares estimation and variable
selection techniques (for a survey of the problems associated with
multicollinearities, see Mason, Gunst, and Webster (1975)).

Latent roots and vectors of X'X provide additional information on
multicollinearities, particularly multicollinearities involving more
than two predictor variables (and, as we shall see in the next
section, form one basis for the expression of biased estimators as a
family). Define the latent roots, $\ell_1 \le \ell_2 \le \cdots \le
\ell_p$, and the corresponding latent vectors, $V_1, V_2, \ldots,
V_p$, of X'X by

\[ (X'X - \ell_j I) V_j = 0 , \qquad j = 1, 2, \ldots, p . \]

Latent vectors corresponding to latent roots that are near zero
identify multicollinearities among the predictor variables.
Specifically, large elements of these latent vectors indicate which
variables are involved in multicollinearities and the nature of the
individual multicollinearities (for a detailed illustration of the
use of latent roots and vectors in the detection of
multicollinearities, see Gunst and Mason (1977a)).
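A minimal sketch of this diagnostic follows, assuming NumPy's symmetric eigensolver `numpy.linalg.eigh` as the latent root routine. The cutoffs `root_tol` and `element_tol` are arbitrary illustrative values, not recommendations from the article, and the simulated data are constructed so that the third predictor is nearly the sum of the first two.

```python
import numpy as np

def multicollinearity_report(xtx, root_tol=0.05, element_tol=0.3):
    """List each near-zero latent root of X'X with the indices of the
    predictors that have large elements in its latent vector.
    Both tolerances are hypothetical choices for this sketch."""
    roots, vectors = np.linalg.eigh(xtx)   # ascending roots; columns are vectors
    report = []
    for r in range(len(roots)):
        if roots[r] < root_tol:
            involved = np.flatnonzero(np.abs(vectors[:, r]) > element_tol)
            report.append((float(roots[r]), involved.tolist()))
    return roots, vectors, report

# Simulated predictors: x3 is almost exactly x1 + x2, x4 is unrelated.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + x2 + 0.01 * rng.normal(size=100)
x4 = rng.normal(size=100)
x = np.column_stack([x1, x2, x3, x4])

# For unit-length standardized data, X'X is the correlation matrix.
xtx = np.corrcoef(x, rowvar=False)
roots, vectors, report = multicollinearity_report(xtx)
```

Here `report` flags the single near-zero root and names the first three predictors (indices 0, 1, 2) as the ones involved, which a table of pairwise correlations alone would not show as clearly.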

An additional diagnostic measure that is useful in assessing
multicollinearities is the variance inflation factor (VIF) of each
predictor variable (Marquardt (1970), Marquardt and Snee (1975)). The
VIF of the jth predictor variable is the jth diagonal element of
$(X'X)^{-1}$. If the columns of X are mutually orthogonal, all the
VIF equal 1.0 since $X'X = (X'X)^{-1} = I$, the (p x p) identity
matrix. The more multicollinear the predictor variables, the larger
are the VIF for the variables involved in the multicollinearities.
Values of the VIF larger than 10, or even as large as 6, indicate
strong multicollinearities and potential difficulties with least
squares estimation.


Rather than computing $(X'X)^{-1}$ from a separate algorithm in order
to obtain the VIF, the latent roots and vectors of X'X can be used
instead. From the relationship

\[ X'X = V L V' = \sum_{r=1}^{p} \ell_r V_r V_r' , \tag{2.1} \]

it follows immediately that

\[ (X'X)^{-1} = V L^{-1} V' = \sum_{r=1}^{p} \ell_r^{-1} V_r V_r' , \tag{2.2} \]

where $V = [V_1, V_2, \ldots, V_p]$ and $L = \mathrm{diag}(\ell_1,
\ell_2, \ldots, \ell_p)$. Thus if $C = (X'X)^{-1}$, the jth VIF is
given by

\[ c_{jj} = \sum_{r=1}^{p} \ell_r^{-1} V_{jr}^2 , \tag{2.3} \]

where $V_{jr}$ is the jth element of $V_r$. By taking advantage of
the mathematical property (2.1), there is no need to compute or store
$(X'X)^{-1}$ once the latent roots and vectors of X'X are obtained.
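The VIF computation from latent roots and vectors amounts to a weighted sum of squared latent vector elements. The sketch below assumes NumPy; the function name `vif_from_latent` and the small correlation matrix are made up for the illustration.

```python
import numpy as np

def vif_from_latent(roots, vectors):
    """jth VIF as the sum over r of V_jr**2 / l_r, with the latent
    vectors stored as the columns of `vectors` (hypothetical helper)."""
    # Broadcasting divides column r of vectors**2 by roots[r].
    return (vectors ** 2 / roots).sum(axis=1)

# Illustrative correlation matrix with one strong pairwise correlation.
xtx = np.array([[1.0, 0.9, 0.2],
                [0.9, 1.0, 0.1],
                [0.2, 0.1, 1.0]])
roots, vectors = np.linalg.eigh(xtx)
vif = vif_from_latent(roots, vectors)

# Check against the direct route, which the sketch is meant to avoid.
vif_direct = np.diag(np.linalg.inv(xtx))
```

The two results agree to rounding error, and the first two predictors, which are correlated at 0.9, receive the largest VIF.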

Other informative summary and diagnostic information such as the
minimum and maximum of each input variable, two-variable plots, or
measures of how influential each data point is on the estimation of
the regression coefficients (e.g., Cook (1977)) could also be
computed or available as optional output. Any or all of these
diagnostic measures could be indispensable for proper analysis and
interpretation of a regression data set. All should be available to
the user.


3.  ESTIMATORS 

The five estimators mentioned in the Introduction are defined
mathematically in the following equations, all of which employ
standardized input variables. Least squares (LS) estimators are given
by

\[ \hat{\beta}_{LS} = (X'X)^{-1} X'Y^* = d_y (X'X)^{-1} X'Y . \tag{3.1} \]
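The second equality in the least squares expression can be checked from the definitions in Section 2. In the short derivation below, $\mathbf{1}$ denotes the (n x 1) vector of ones, a symbol introduced here only for the illustration. Because each column of X has been centered, $X'\mathbf{1} = 0$, so

\[ Y^* = \bar{Y}^* \mathbf{1} + d_y Y \quad\Longrightarrow\quad X'Y^* = \bar{Y}^* X'\mathbf{1} + d_y X'Y = d_y X'Y . \]

This is why only the correlations X'Y and the scalar $d_y$, and not the raw response vector, are needed to form the estimators.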


For some k > 0, (simple) ridge regression (RR) estimators can be
written as

\[ \hat{\beta}_{RR} = d_y (X'X + kI)^{-1} X'Y . \tag{3.2} \]


A principal component (PC) estimator which deletes the first s
components (obvious alterations can be made if subsets other than the
first s are to be deleted) can be obtained as

\[ \hat{\beta}_{PC} = d_y (X'X)^{+} X'Y , \tag{3.3} \]

where $(X'X)^{+} = V L^{+} V'$ and $L^{+} = \mathrm{diag}(0, 0,
\ldots, 0, \ell_{s+1}^{-1}, \ldots, \ell_p^{-1})$. Shrunken
estimators (SE) can be calculated by

\[ \hat{\beta}_{SE} = g \hat{\beta}_{LS} = g\, d_y (X'X)^{-1} X'Y , \tag{3.4} \]

where 0 < g < 1. Finally, latent root estimators (LR) are functions
of the latent roots, $\lambda_0 \le \lambda_1 \le \cdots \le
\lambda_p$, and the corresponding latent vectors, $\gamma_0,
\gamma_1, \ldots, \gamma_p$, of the (p+1) by (p+1) matrix A'A, where
A = [Y : X]. (This matrix is already available from the initial
arrays since

\[ A'A = \begin{bmatrix} 1 & Y'X \\ X'Y & X'X \end{bmatrix} \]

and the same algorithm used to calculate the latent roots and vectors
of X'X can be used to calculate those of A'A.) For ease of notation
let $\gamma_j' = (\gamma_{0j} : \delta_j')$, where $\delta_j' =
(\gamma_{1j}, \gamma_{2j}, \ldots, \gamma_{pj})$. Then the latent
root estimator can be written as

\[ \hat{\beta}_{LR} = d_y \sum_r f_r \delta_r , \tag{3.5} \]

where $f_r = -\gamma_{0r} \lambda_r^{-1} / (\sum_q \gamma_{0q}^2
\lambda_q^{-1})$ and the summations are taken over all subscripts for
which $\gamma_{0r}$ and $\lambda_r$ are not simultaneously close to
zero.
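The latent root estimator can be sketched as follows, assuming NumPy. The weights are taken with a negative sign, $f_r = -\gamma_{0r}\lambda_r^{-1} / \sum_q \gamma_{0q}^2 \lambda_q^{-1}$, which is the sign under which the estimator reduces to least squares when no terms are dropped. The function name and the two tolerances that define "close to zero" are assumptions for the illustration, as are the example arrays.

```python
import numpy as np

def latent_root_estimator(xtx, xty, d_y, root_tol=0.05, gamma_tol=0.05):
    """Latent root estimator built from the arrays X'X and X'Y.
    A term is dropped when its latent root and the first element of
    its latent vector are both below the (hypothetical) tolerances."""
    p = len(xty)
    ata = np.empty((p + 1, p + 1))         # A'A for A = [Y : X]
    ata[0, 0] = 1.0
    ata[0, 1:] = xty
    ata[1:, 0] = xty
    ata[1:, 1:] = xtx
    lam, gam = np.linalg.eigh(ata)         # ascending roots; columns are vectors
    gamma0 = gam[0, :]                     # first element of each latent vector
    delta = gam[1:, :]                     # remaining p elements
    keep = ~((lam < root_tol) & (np.abs(gamma0) < gamma_tol))
    weights = gamma0[keep] / lam[keep]
    f = -weights / (gamma0[keep] * weights).sum()
    return d_y * delta[:, keep] @ f

# Illustrative arrays (not data from the article).
xtx = np.array([[1.0, 0.9, 0.2],
                [0.9, 1.0, 0.1],
                [0.2, 0.1, 1.0]])
xty = np.array([0.5, 0.4, 0.3])
d_y = 2.0

# With root_tol=0 nothing is dropped, so the result should match LS.
b_lr_all = latent_root_estimator(xtx, xty, d_y, root_tol=0.0)
b_ls = d_y * np.linalg.solve(xtx, xty)
```

Comparing `b_lr_all` with `b_ls` is a convenient check on the sign and normalization of the weights before any terms are deleted.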


Equations (3.1) to (3.5) appear to indicate that several matrix
inversions and large storage requirements are needed to calculate all
the biased estimators listed. Actually, apart from the initial arrays
mentioned in Section 2, only the latent roots and vectors of X'X and
A'A need be computed and stored. All five estimators can be expressed
in the general form

\[ \hat{\beta} = d_y \sum_r h_r m_r , \tag{3.6} \]


where the $h_r$ are appropriately defined scalars and the $m_r$ are
latent vectors of either X'X or A'A. Specifically, $h_r$ and $m_r$
are defined as follows for the five estimators:

\[
\begin{array}{lll}
\text{LS:} & m_r = V_r , & h_r = \ell_r^{-1} V_r'X'Y , \quad r = 1, 2, \ldots, p \\[4pt]
\text{RR:} & m_r = V_r , & h_r = (\ell_r + k)^{-1} V_r'X'Y , \quad r = 1, 2, \ldots, p \\[4pt]
\text{PC:} & m_r = V_r , & h_r = \begin{cases} 0 , & r = 1, 2, \ldots, s \\ \ell_r^{-1} V_r'X'Y , & r = s+1, \ldots, p \end{cases} \\[4pt]
\text{SE:} & m_r = V_r , & h_r = g\, \ell_r^{-1} V_r'X'Y , \quad r = 1, 2, \ldots, p \\[4pt]
\text{LR:} & m_r = \delta_r , & h_r = \begin{cases} 0 , & \gamma_{0r} \approx 0 \text{ and } \lambda_r \approx 0 \\ f_r , & \text{otherwise} \end{cases} \quad r = 0, 1, \ldots, p .
\end{array}
\tag{3.7}
\]
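The ridge case illustrates how an estimator is brought into the general form (3.6). The derivation below uses only (2.1), (3.2), and the orthogonality $V'V = VV' = I$ of the latent vectors:

\[ (X'X + kI)^{-1} = \{ V (L + kI) V' \}^{-1} = \sum_{r=1}^{p} (\ell_r + k)^{-1} V_r V_r' , \]

\[ \hat{\beta}_{RR} = d_y \sum_{r=1}^{p} (\ell_r + k)^{-1} (V_r'X'Y)\, V_r . \]

Setting k = 0 recovers the LS weights, and multiplying those weights by g gives the SE weights, so the three estimators differ only in the scalar attached to each latent vector.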

Not only are large core storage requirements reduced by using (3.6)
and (3.7), since $(X'X)^{-1}$, $(X'X + kI)^{-1}$, and $(X'X)^{+}$ do
not need to be retained, but computing time is shortened in at least
two ways. First, $V_r'X'Y$ appears in several of the $h_r$ in (3.7),
but each of these p quantities need only be computed once. Secondly,
if one wishes to examine several choices of k for RR or several
selections of s for PC, for example, repeated calculation of
$(X'X + kI)^{-1}$ and $(X'X)^{+}$, and then of $\hat{\beta}_{RR}$ and
$\hat{\beta}_{PC}$ through (3.2) and (3.3), is unnecessary. It is
computationally quite simple and relatively fast to alter k and s in
(3.7) and calculate the estimators using (3.6).
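The savings described here can be sketched for the four estimators that share the latent vectors of X'X. The code assumes NumPy; the function name `family_estimators`, its arguments, the dictionary keys, and the example arrays are all hypothetical. One latent root decomposition and one pass for the quantities $V_r'X'Y$ serve every choice of k, s, and g.

```python
import numpy as np

def family_estimators(xtx, xty, d_y, ks=(), ss=(), g=None):
    """LS, RR, PC and SE estimators in the common form
    d_y * sum_r h_r V_r, sharing one eigendecomposition (a sketch)."""
    roots, V = np.linalg.eigh(xtx)         # ascending latent roots of X'X
    c = V.T @ xty                          # V_r'X'Y, computed once
    out = {"LS": d_y * V @ (c / roots)}
    for k in ks:                           # ridge: only the weights change
        out[("RR", k)] = d_y * V @ (c / (roots + k))
    for s in ss:                           # PC: zero the first s weights
        h = c / roots
        h[:s] = 0.0
        out[("PC", s)] = d_y * V @ h
    if g is not None:                      # shrunken: scale the LS weights
        out["SE"] = d_y * V @ (g * c / roots)
    return out

# Illustrative arrays (not data from the article).
xtx = np.array([[1.0, 0.9, 0.2],
                [0.9, 1.0, 0.1],
                [0.2, 0.1, 1.0]])
xty = np.array([0.5, 0.4, 0.3])
d_y = 2.0
est = family_estimators(xtx, xty, d_y, ks=(0.1, 0.5), ss=(0, 1), g=0.8)
```

No inverse or generalized inverse is formed or stored at any point; each additional value of k or s costs one vector of p weights and one matrix-vector product.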


4.  CONCLUDING  REMARKS 

Other useful statistics such as variable selection measures can be
expressed uniformly, just as the estimators in the previous section.
One should seek such expressions when writing statistical software in
order to take advantage of reduced storage and computing time. Not
only will reductions in storage and computing time result in monetary
savings, but the data analyst will find that the computer programs so
written will also be able to process much larger data sets than if
the suggestions made in this paper were not followed. Several hundred
observations on a moderate number of predictor variables can be a
prohibitively large number if Y*, X*, $(X'X)^{-1}$, $(X'X + kI)^{-1}$,
etc. must be stored for each computing run.